Invoice OCR in Practice: 10 Common Errors and How to Fix Them
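Hi everyone, I'm 程序员晚枫 (Programmer Wanfeng), a developer building all kinds of AI projects hands-on.
😭 You've almost certainly run into these errors
In the past 3 months I've processed more than 100,000 invoices.
I've stepped into more pitfalls than I can count.
Today I'm rounding up the 10 most common errors and how to fix them.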
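Error 1: Invalid or expired key
Error message: AuthenticationError: Invalid credentials
Cause: the key has expired or been revoked, or the environment variable that should hold it is not set.
Fix: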
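```python
import os

# Check that the key is actually loaded (prints None if the variable is unset)
print(os.getenv('TENCENT_SECRET_ID'))

# Replace it with a newly generated key (placeholder value)
os.environ['TENCENT_SECRET_ID'] = 'your-new-secret-id'
```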
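Beyond printing the variable, it helps to fail fast when either half of the key pair is missing, and to print only a masked version so the secret never lands in logs. A minimal sketch: the `TENCENT_SECRET_KEY` variable name and both helper functions are assumptions for illustration, not part of poocr.

```python
import os


def load_tencent_credentials(env=None):
    """Hypothetical helper: read the key pair and fail fast if either part is missing."""
    env = os.environ if env is None else env
    pair = {
        'TENCENT_SECRET_ID': env.get('TENCENT_SECRET_ID', '').strip(),
        'TENCENT_SECRET_KEY': env.get('TENCENT_SECRET_KEY', '').strip(),  # assumed variable name
    }
    missing = [name for name, value in pair.items() if not value]
    if missing:
        raise RuntimeError(f"Missing credentials: {', '.join(missing)}")
    return pair['TENCENT_SECRET_ID'], pair['TENCENT_SECRET_KEY']


def mask(secret, visible=4):
    """Hide all but the last `visible` characters so a key can be checked in logs safely."""
    if len(secret) <= visible:
        return '*' * len(secret)
    return '*' * (len(secret) - visible) + secret[-visible:]
```

For example, `print(mask(load_tencent_credentials()[0]))` confirms which key is loaded without exposing it.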
Error 2: Network timeout
Error message: TimeoutError: Request timeout
Cause: an unstable network or a slow API response.
Fix:
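```python
import time

import poocr


def recognize_with_retry(file_path, max_retries=3):
    """Retry the OCR call on timeout, waiting 5 seconds between attempts."""
    for i in range(max_retries):
        try:
            return poocr.ocr2excel.VatInvoiceOCR2Excel(...)  # pass file_path and your other arguments here
        except TimeoutError:  # catch whichever timeout exception your SDK actually raises
            if i < max_retries - 1:
                time.sleep(5)  # brief pause before the next attempt
            else:
                raise  # last attempt failed: let the caller see the error
```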
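A fixed 5-second wait works, but when many requests time out at once, exponential backoff with a little random jitter spreads the retries out. A sketch of a generic version with hypothetical names; wrap the poocr call in a lambda to use it.

```python
import random
import time


def retry_with_backoff(func, max_retries=3, base_delay=1.0, max_delay=30.0,
                       retry_on=(TimeoutError,), sleep=time.sleep):
    """Call func(); on a retryable error, wait base_delay * 2**attempt (plus jitter) and retry."""
    for attempt in range(max_retries):
        try:
            return func()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of attempts: re-raise the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter keeps many clients from retrying in lockstep
            sleep(delay + random.uniform(0, delay / 2))
```

Usage: `retry_with_backoff(lambda: poocr.ocr2excel.VatInvoiceOCR2Excel(...))`. The `sleep` parameter only exists so the wait can be skipped in tests.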
Error 3: Unsupported image format
Error message: ValueError: Unsupported image format
Cause: the image is in a format the OCR API does not accept.
Fix:
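```python
from PIL import Image


def convert_image(image_path):
    """Convert formats other than JPEG/PNG to JPEG; return the path to send to OCR."""
    # Pillow cannot open PDFs, so pass PDF files to the API directly instead
    with Image.open(image_path) as img:
        if img.format in ('JPEG', 'PNG'):
            return image_path  # already supported: use as-is
        new_path = image_path + '.jpg'
        # JPEG has no alpha channel, so convert to RGB before saving
        img.convert('RGB').save(new_path, 'JPEG')
        return new_path
```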
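Sometimes the extension lies: a file named `.jpg` may really be a WebP or HEIC image saved by a phone or chat app. Checking the first bytes (the file signature) reveals the real format. A small sketch covering a few common signatures; the function names are hypothetical.

```python
def sniff_format(header: bytes):
    """Guess a file's real format from its leading bytes instead of trusting the extension."""
    if header.startswith(b'\xff\xd8\xff'):
        return 'JPEG'
    if header.startswith(b'\x89PNG\r\n\x1a\n'):
        return 'PNG'
    if header.startswith(b'%PDF'):
        return 'PDF'
    if header[:4] == b'RIFF' and header[8:12] == b'WEBP':
        return 'WEBP'
    # HEIC files are ISO media containers: 'ftyp' box followed by a brand code
    if header[4:8] == b'ftyp' and header[8:12] in (b'heic', b'heix'):
        return 'HEIC'
    return None  # unknown signature


def sniff_file(path):
    """Read the first 16 bytes of a file and identify its format."""
    with open(path, 'rb') as f:
        return sniff_format(f.read(16))
```

If `sniff_file(path)` disagrees with the extension, convert the file before sending it to OCR.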
Error 4: Image too large
Error message: FileSizeExceeded: Image too large
Cause: the image exceeds the API's size limit (usually 10MB)
Fix:
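```python
import os

from PIL import Image


def compress_image(image_path, max_size=5):
    """Re-encode as JPEG at decreasing quality until the file is under max_size MB."""
    if os.path.getsize(image_path) / 1024 / 1024 <= max_size:
        return image_path  # already small enough
    out_path = os.path.splitext(image_path)[0] + '_compressed.jpg'  # keep the original intact
    with Image.open(image_path) as img:
        img = img.convert('RGB')  # PNG ignores `quality`, so always save as JPEG
        for quality in (85, 70, 50, 30):
            img.save(out_path, 'JPEG', quality=quality, optimize=True)
            if os.path.getsize(out_path) / 1024 / 1024 <= max_size:
                break
    return out_path
```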
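A wide safety margin under the 10MB limit also helps because images are often uploaded Base64-encoded, and Base64 makes the payload about a third larger: n raw bytes become exactly 4 × ⌈n / 3⌉ characters. Whether your API applies its limit before or after encoding is an assumption to check in its documentation; this sketch assumes after, with hypothetical function names.

```python
def base64_size(num_bytes):
    """Exact length of the padded Base64 encoding of num_bytes raw bytes: 4 * ceil(n / 3)."""
    return 4 * ((num_bytes + 2) // 3)


def fits_limit(num_bytes, limit_mb=10):
    """True if the Base64-encoded payload stays within limit_mb (limit assumed to apply after encoding)."""
    return base64_size(num_bytes) <= limit_mb * 1024 * 1024
```

By this arithmetic a 7MB file already encodes to 9,786,712 bytes (about 9.3MB), while an 8MB file no longer fits under 10MB.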
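Error 5: Empty recognition result
Error message: none, but the result comes back empty
Cause: usually poor image quality, for example a photo that is too dark or too blurry.
Fix: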
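```python
import cv2
import numpy as np


def check_image_quality(image_path):
    """Heuristic pre-check: flag images that are too dark or too blurry for OCR."""
    # np.fromfile + imdecode also handles non-ASCII (e.g. Chinese) paths,
    # which cv2.imread cannot open on Windows
    img = cv2.imdecode(np.fromfile(image_path, dtype=np.uint8), cv2.IMREAD_COLOR)
    if img is None:
        return False, "Image could not be read"
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    if np.mean(gray) < 80:  # mean brightness on a 0-255 scale; tune for your scans
        return False, "Image too dark"
    if cv2.Laplacian(gray, cv2.CV_64F).var() < 100:  # low Laplacian variance = few sharp edges = blurry
        return False, "Image blurry"
    return True, "OK"
```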
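When the check flags an image as too dark, brightening it with gamma correction before re-running OCR is one thing to try (re-scanning is still better if the paper original is at hand). A sketch of the lookup table, with a hypothetical function name: gamma below 1 brightens, above 1 darkens.

```python
def gamma_lut(gamma):
    """256-entry table mapping each 0-255 pixel value v to 255 * (v / 255) ** gamma."""
    if gamma <= 0:
        raise ValueError('gamma must be positive')
    return [round(255 * (v / 255) ** gamma) for v in range(256)]
```

With OpenCV, apply it to the image loaded in `check_image_quality` via `cv2.LUT(img, np.array(gamma_lut(0.6), dtype=np.uint8))`; the value 0.6 is only a starting point.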
Error 6: Missing field
Error message: KeyError: '发票代码'
Cause: OCR failed to recognize the field, so it is absent from the result
Fix:
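```python
# '发票代码' is the invoice-code field; .get() returns '' instead of raising KeyError
invoice_code = invoice_data.get('发票代码', '')
if not invoice_code:
    print("⚠️ Invoice code not recognized; needs manual review")
```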
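With more than one field to check, it's easier to test them all at once and route incomplete invoices to a manual-review list instead of printing one warning at a time. A sketch with hypothetical helpers; the field names in `REQUIRED_FIELDS` are illustrative and should match the keys your OCR output actually uses.

```python
# Illustrative field names (invoice code, invoice number, issue date, total incl. tax);
# adjust to the keys your OCR output actually contains
REQUIRED_FIELDS = ['发票代码', '发票号码', '开票日期', '价税合计']


def _is_blank(value):
    """A field counts as missing if it is None or only whitespace."""
    return value is None or str(value).strip() == ''


def find_missing_fields(invoice_data, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank, in order."""
    return [name for name in required if _is_blank(invoice_data.get(name))]


def triage(invoices, required=REQUIRED_FIELDS):
    """Split invoices into complete ones and (invoice, missing_fields) pairs for manual review."""
    complete, needs_review = [], []
    for invoice in invoices:
        missing = find_missing_fields(invoice, required)
        if missing:
            needs_review.append((invoice, missing))
        else:
            complete.append(invoice)
    return complete, needs_review
```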
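Error 7: Excel write failure
Error message: PermissionError: File is in use
Cause: the Excel file is locked by another program, usually because it is still open in Excel
Fix: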
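```python
import os
import time


def is_file_in_use(file_path):
    """On Windows, Excel locks open workbooks, so opening them for writing raises PermissionError."""
    if not os.path.exists(file_path):
        return False  # file not created yet: nothing can be locking it
    try:
        with open(file_path, 'r+b'):
            pass
        return False
    except PermissionError:
        return True


# Wait for the file to be closed, but give up after 60 seconds instead of looping forever
deadline = time.time() + 60
while is_file_in_use(output_path):
    if time.time() > deadline:
        raise TimeoutError(f"{output_path} is still open; close it in Excel and retry")
    time.sleep(1)
```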
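Waiting works when someone is at the keyboard to close the file; in an unattended batch job, writing to a new timestamped file avoids blocking altogether. A sketch with hypothetical helpers; `choose_output_path` takes the lock check as a parameter so it can reuse `is_file_in_use` from the fix above.

```python
import os
from datetime import datetime


def fallback_path(output_path, now=None):
    """Build a timestamped sibling name, e.g. invoices.xlsx -> invoices_20240102_030405.xlsx."""
    now = now or datetime.now()
    root, ext = os.path.splitext(output_path)
    return f"{root}_{now:%Y%m%d_%H%M%S}{ext}"


def choose_output_path(output_path, is_locked, now=None):
    """Use output_path unless is_locked(output_path) reports it open; then use a fallback name."""
    return fallback_path(output_path, now) if is_locked(output_path) else output_path
```

Usage: `path = choose_output_path(output_path, is_file_in_use)`, then write the workbook to `path`.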
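Error 8: Encoding error
Error message: UnicodeDecodeError: 'gbk' codec can't decode
Cause: the file's encoding doesn't match the one used to read it. On Windows with a Chinese locale, open() without an encoding argument defaults to GBK, so reading a UTF-8 file fails this way.
Fix: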
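```python
# Option 1: if you know the file is UTF-8, say so explicitly
with open(file_path, 'r', encoding='utf-8') as f:
    content = f.read()

# Option 2: try common encodings in order.
# gb2312 is a subset of gbk, so it can't succeed where gbk fails; gb18030 (a superset) can.
for encoding in ['utf-8-sig', 'gbk', 'gb18030']:
    try:
        with open(file_path, 'r', encoding=encoding) as f:
            content = f.read()
        break
    except UnicodeDecodeError:  # only skip decode errors, not e.g. FileNotFoundError
        continue
else:
    raise ValueError(f"Could not decode {file_path} with any of the tried encodings")
```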
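The same mismatch shows up in reverse when exporting results as CSV: Excel on Windows may open a UTF-8 file without a byte-order mark using the system code page and display garbled Chinese. Encoding with `utf-8-sig` adds the BOM that lets Excel recognize UTF-8. A sketch that builds the bytes in memory; the function name is hypothetical.

```python
import csv
import io


def csv_bytes_for_excel(rows):
    """Serialize rows as CSV encoded with utf-8-sig, i.e. UTF-8 prefixed with a BOM."""
    buffer = io.StringIO(newline='')
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue().encode('utf-8-sig')
```

Write the result with `open(path, 'wb')`, or pass `encoding='utf-8-sig'` and `newline=''` straight to `open()` when using the `csv` module; pandas' `DataFrame.to_csv` accepts the same `encoding` value.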
Error 9: Path error
Error message: FileNotFoundError: No such file or directory
Cause: the path doesn't exist or is misspelled
Fix:
```python
import os

# Relative paths are resolved against the current working directory, not the script's folder
abs_path = os.path.abspath(file_path)
if not os.path.exists(abs_path):
    print(f"❌ File not found: {abs_path}")
```
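Since typos are one of the two causes, listing similarly named files in the same folder often points straight at the mistake. A sketch using the standard library's `difflib`; the helper names are hypothetical.

```python
import difflib
from pathlib import Path


def suggest_similar(missing_name, candidates, n=3):
    """Return up to n names from candidates that look like a misspelling of missing_name."""
    return difflib.get_close_matches(missing_name, candidates, n=n, cutoff=0.6)


def explain_missing(path):
    """Return None if path exists, otherwise a message with likely intended file names."""
    p = Path(path).resolve()
    if p.exists():
        return None
    if not p.parent.is_dir():
        return f"Folder does not exist: {p.parent}"
    hints = suggest_similar(p.name, [child.name for child in p.parent.iterdir()])
    message = f"File not found: {p}"
    if hints:
        message += f" (did you mean: {', '.join(hints)}?)"
    return message
```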
Error 10: API quota exceeded
Error message: QuotaExceeded: Monthly quota exceeded
Cause: the free quota has been used up
Fix:
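```python
import requests


def check_quota():
    # NOTE: this URL is a Tencent Cloud console page, not a documented JSON API, and
    # get_auth_token() is not defined here; treat both as placeholders for whatever
    # usage/billing endpoint and request signing your account actually uses.
    response = requests.get(
        'https://console.cloud.tencent.com/ocr/quota',
        headers={'Authorization': get_auth_token()},
        timeout=10,
    )
    response.raise_for_status()
    quota = response.json()  # assumes a JSON body with a 'remaining' field
    print(f"Remaining quota: {quota['remaining']}")
    if quota['remaining'] < 100:
        print("⚠️ Quota running low, please top up")
```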
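If the remaining quota isn't easy to query programmatically, a simple alternative is to count calls locally and stop just before the plan's monthly limit. A sketch: `QuotaGuard` is hypothetical, the limit is whatever your plan allows, and it only counts calls made through this program.

```python
import json
import os
from datetime import date


class QuotaGuard:
    """Hypothetical local counter: track OCR calls per calendar month and refuse calls past a limit."""

    def __init__(self, monthly_limit, state_file=None, today=date.today):
        self.monthly_limit = monthly_limit
        self.state_file = state_file  # optional JSON file so counts survive restarts
        self.today = today            # injectable clock, handy for testing
        self.counts = self._load()

    def _load(self):
        if self.state_file and os.path.exists(self.state_file):
            with open(self.state_file, encoding='utf-8') as f:
                return json.load(f)
        return {}

    def _month(self):
        return self.today().strftime('%Y-%m')

    def remaining(self):
        return self.monthly_limit - self.counts.get(self._month(), 0)

    def record_call(self):
        """Call once before each OCR request; raises when the local limit is reached."""
        if self.remaining() <= 0:
            raise RuntimeError('Monthly OCR limit reached; stop or top up before continuing')
        month = self._month()
        self.counts[month] = self.counts.get(month, 0) + 1
        if self.state_file:
            with open(self.state_file, 'w', encoding='utf-8') as f:
                json.dump(self.counts, f)
```

Counts are keyed by month (`'2024-05'` style), so the budget resets automatically when the month changes.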
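📊 Error-handling best practices
```python
import os

import poocr


def robust_invoice_recognition(file_path):
    """Robust invoice recognition: validate the input, check quality, then call OCR."""
    try:
        # 1. Make sure the file exists
        if not os.path.exists(file_path):
            return None, "File not found"
        # 2. Reject dark or blurry images before spending API quota
        is_ok, msg = check_image_quality(file_path)
        if not is_ok:
            return None, msg
        # 3. Call OCR; this assumes the call returns a dict of fields,
        #    so adapt it if your version only writes to Excel
        result = poocr.ocr2excel.VatInvoiceOCR2Excel(...)
        # 4. Validate a key field ('发票代码' = invoice code)
        if not result or not result.get('发票代码'):
            return None, "Recognition failed"
        return result, "Success"
    except Exception as e:
        # Catch-all so one bad invoice doesn't stop a whole batch; log e in real use
        return None, f"Exception: {e}"
```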
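To process a whole folder, the `(result, message)` pairs returned by `robust_invoice_recognition` can be collected into successes, failures, and a count per failure reason, which shows at a glance whether the trouble is image quality, paths, or the API. A sketch; `run_batch` is hypothetical and takes the recognizer as a parameter.

```python
from collections import Counter


def run_batch(paths, recognize):
    """Run recognize(path) -> (result, message) for each path and summarize the outcome."""
    successes, failures = [], []
    for path in paths:
        result, message = recognize(path)
        if result is None:
            failures.append((path, message))
        else:
            successes.append((path, result))
    reasons = Counter(message for _, message in failures)  # e.g. how many failed as "Image blurry"
    return successes, failures, reasons
```

Usage: `run_batch(glob.glob('invoices/*.jpg'), robust_invoice_recognition)`, where the folder name is only an example.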
💬 Contact me
Services: AI programming training, corporate training, technical consulting
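Errors aren't scary; what's scary is not knowing how to fix them.
I hope this guide helps you avoid a few pitfalls.
When you hit a problem, don't panic: try the fixes above. 💪
🎓 AI Programming in Practice course
Want to learn AI programming systematically? 程序员晚枫's AI 编程实战课 (AI Programming in Practice course) will get you started from scratch!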